Transferrable Representations for Visual Recognition
Author: Jeffrey Donahue
Abstract
by Jeffrey Donahue. Doctor of Philosophy in Computer Science, University of California, Berkeley. Professor Trevor Darrell, Chair.

The rapid progress in visual recognition capabilities over the past several years can be attributed largely to improvements in generic and transferrable feature representations, particularly learned representations based on convolutional networks (convnets) trained "end-to-end" to predict visual semantics given raw pixel intensity values. In this thesis, we analyze the structure of these convnet representations and their generality and transferability to other tasks and settings. We begin in Chapter 2 by examining the hierarchical semantic structure that naturally emerges in convnet representations from large-scale supervised training, even when this structure is unobserved in the training set. Empirically, the resulting representations generalize surprisingly well to classification in related yet distinct settings.

Chapters 3 and 4 showcase the flexibility of convnet-based representations for prediction tasks where the inputs or targets have more complex structure. Chapter 3 focuses on representation transfer to the object detection and semantic segmentation tasks, in which objects must be localized within an image as well as labeled. Chapter 4 augments convnets with recurrent structure to handle recognition problems with sequential inputs (e.g., video activity recognition) or outputs (e.g., image captioning). Across each of these domains, end-to-end fine-tuning of the representation for the target task provides a substantial additional performance benefit.

Finally, we address the necessity of label supervision for representation learning. In Chapter 5 we propose an unsupervised learning approach based on generative models, demonstrating that some of the transferrable semantic structure learned by supervised convnets can be learned from images alone.
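A recurring theme in the abstract is the "pretrain, transfer, fine-tune" recipe: start from a convnet trained with large-scale supervision, swap in a classifier head for the target task, and either use the network as a fixed feature extractor or fine-tune it end-to-end. The sketch below is a minimal illustration of that recipe, not the thesis's own pipeline; it assumes PyTorch with torchvision 0.13 or later, uses a torchvision ResNet-18 as a stand-in for a generic pretrained convnet, and the 20-class target task is hypothetical.

# Minimal sketch of the pretrain-transfer-fine-tune recipe summarized above.
# Illustrative only: not the thesis's pipeline. Assumes PyTorch and
# torchvision >= 0.13; ResNet-18 stands in for a generic pretrained convnet,
# and NUM_TARGET_CLASSES is a hypothetical target-task label count.
import torch
import torch.nn as nn
import torchvision.models as models

NUM_TARGET_CLASSES = 20  # assumed size of the new task's label set

# 1. Start from a convnet trained with large-scale supervision (ImageNet).
net = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# 2. Replace the final classifier so its output matches the target task.
net.fc = nn.Linear(net.fc.in_features, NUM_TARGET_CLASSES)

# 3a. Fixed-feature transfer: freeze every pretrained layer so that only the
#     newly added head is trained. Skip this loop for end-to-end fine-tuning.
for name, param in net.named_parameters():
    if not name.startswith("fc."):
        param.requires_grad = False

# 3b. Optimizer with a smaller learning rate for the pretrained layers than
#     for the new head; when 3a is applied, the frozen layers simply receive
#     no gradient and are left unchanged.
optimizer = torch.optim.SGD(
    [
        {"params": [p for n, p in net.named_parameters() if not n.startswith("fc.")],
         "lr": 1e-4},
        {"params": list(net.fc.parameters()), "lr": 1e-2},
    ],
    momentum=0.9,
)

# A standard training loop over labeled target-task images then updates `net`
# via optimizer.step(); freezing versus full fine-tuning is the only switch.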
Similar resources
Robust Visual Knowledge Transfer via EDA
We address the problem of visual knowledge adaptation by leveraging labeled patterns from a source domain and a very limited number of labeled instances in a target domain to learn a robust classifier for visual categorization. This paper proposes a new extreme learning machine based cross-domain network learning framework, called Extreme Learning Machine (ELM) based Domain Adaptation (EDA...
Size-Sensitive Perceptual Representations Underlie Visual and Haptic Object Recognition
A variety of similarities between visual and haptic object recognition suggests that the two modalities may share common representations. However, it is unclear whether such common representations preserve low-level perceptual features or whether transfer between vision and haptics is mediated by high-level, abstract representations. Two experiments used a sequential shape-matching task to exam...
Invariant recognition drives neural representations of action sequences
Recognizing the actions of others from visual stimuli is a crucial aspect of human perception that allows individuals to respond to social cues. Humans are able to discriminate between similar actions despite transformations, like changes in viewpoint or actor, that substantially alter the visual appearance of a scene. This ability to generalize across complex transformations is a hallmark of h...
A comparison of local versus global image decompositions for visual speechreading
What is the appropriate spatial scale for image representation? In the primate visual system, receptive fields are small at early stages of processing (area V1), and larger at late stages of processing (areas MT, IT). In the current work, we explore the efficiency of local and global image representations on an automatic visual speech recognition task using an HMM as the recognition system. We comp...
Learning Transferrable Representations for Unsupervised Domain Adaptation
Supervised learning with large-scale labelled datasets and deep layered models has caused a paradigm shift in diverse areas of learning and recognition. However, this approach still suffers from generalization issues in the presence of a domain shift between the training and the test data distributions. Since unsupervised domain adaptation algorithms directly address this domain shift problem...
Publication date: 2017